Dataset statistics
| Number of variables | 24 |
|---|---|
| Number of observations | 50000 |
| Missing cells | 289 |
| Missing cells (%) | < 0.1% |
| Duplicate rows | 6615 |
| Duplicate rows (%) | 13.2% |
| Total size in memory | 9.0 MiB |
| Average record size in memory | 188.0 B |
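The two percentage rows follow from the counts in the same table. A minimal Python check, assuming the missing-cell percentage is taken over all cells (observations × variables) and the duplicate percentage over observations; the variable names are illustrative only:

```python
# Counts copied from the "Dataset statistics" table.
n_variables = 24
n_observations = 50_000
missing_cells = 289
duplicate_rows = 6_615

# Assumption: missing cells are expressed relative to all cells in the table.
missing_pct = 100 * missing_cells / (n_observations * n_variables)

# Assumption: duplicate rows are expressed relative to the number of observations.
duplicate_pct = 100 * duplicate_rows / n_observations

print(f"missing cells: {missing_pct:.3f}%")   # small enough to display as "< 0.1%"
print(f"duplicate rows: {duplicate_pct:.1f}%")
```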
Variable types
| CAT | 12 |
|---|---|
| NUM | 10 |
| BOOL | 2 |
Warnings
| Warning | Type |
|---|---|
| Dataset has 6615 (13.2%) duplicate rows | Duplicates |
| country has a high cardinality: 154 distinct values | High cardinality |
| arrival_date has a high cardinality: 793 distinct values | High cardinality |
| previous_cancellations is highly skewed (γ1 = 28.90866083) | Skewed |
| lead_time has 3915 (7.8%) zeros | Zeros |
| stays_in_weekend_nights has 21640 (43.3%) zeros | Zeros |
| stays_in_week_nights has 3818 (7.6%) zeros | Zeros |
| previous_cancellations has 49619 (99.2%) zeros | Zeros |
| previous_bookings_not_canceled has 47735 (95.5%) zeros | Zeros |
| booking_changes has 39823 (79.6%) zeros | Zeros |
| days_in_waiting_list has 49116 (98.2%) zeros | Zeros |
| average_daily_rate has 1166 (2.3%) zeros | Zeros |
| total_of_special_requests has 24493 (49.0%) zeros | Zeros |
Reproduction
| Analysis started | 2022-10-09 00:39:28.049113 |
|---|---|
| Analysis finished | 2022-10-09 00:39:54.336119 |
| Duration | 26.29 seconds |
| Software version | pandas-profiling v2.9.0 |
| Download configuration | config.yaml |
hotel
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 390.6 KiB |
| Value | Count | Frequency (%) | |
| City_Hotel | 30752 | 61.5% | |
| Resort_Hotel | 19248 | 38.5% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 12 |
|---|---|
| Median length | 10 |
| Mean length | 10.76992 |
| Min length | 10 |
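The length statistics can be reproduced from the frequency table alone. The sketch below assumes "length" means the character count of each label and that the mean is weighted by how often each label occurs:

```python
# Counts copied from the frequency table for `hotel`.
counts = {"City_Hotel": 30_752, "Resort_Hotel": 19_248}

total = sum(counts.values())
lengths = [len(label) for label in counts]

# Count-weighted mean of the label lengths (assumed definition of "Mean length").
mean_length = sum(len(label) * n for label, n in counts.items()) / total

print(total, min(lengths), max(lengths), round(mean_length, 5))
```

Under these assumptions the result agrees with the reported minimum, maximum, and mean length.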
lead_time
Real number (ℝ≥0)
| Distinct | 414 |
|---|---|
| Distinct (%) | 0.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 80.09412 |
|---|---|
| Minimum | 0 |
| Maximum | 709 |
| Zeros | 3915 |
| Zeros (%) | 7.8% |
| Memory size | 390.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 8 |
| median | 45 |
| Q3 | 125 |
| 95-th percentile | 269 |
| Maximum | 709 |
| Range | 709 |
| Interquartile range (IQR) | 117 |
Descriptive statistics
| Standard deviation | 91.20136192 |
|---|---|
| Coefficient of variation (CV) | 1.13867737 |
| Kurtosis | 2.220205704 |
| Mean | 80.09412 |
| Median Absolute Deviation (MAD) | 42 |
| Skewness | 1.514854429 |
| Sum | 4004706 |
| Variance | 8317.688415 |
| Monotonicity | Not monotonic |
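Several of the statistics above are tied together by simple identities, which gives a quick consistency check. The sketch below copies the reported values for this variable and takes the number of observations (50000) from the dataset statistics; the variable names are illustrative only:

```python
import math

# Values copied from the quantile and descriptive statistics tables above.
n_observations = 50_000
total = 4_004_706
std = 91.20136192
q1, q3 = 8, 125
minimum, maximum = 0, 709

mean = total / n_observations      # Mean = Sum / number of observations
cv = std / mean                    # Coefficient of variation = std / mean
variance = std ** 2                # Variance = std squared
iqr = q3 - q1                      # Interquartile range = Q3 - Q1
value_range = maximum - minimum    # Range = Maximum - Minimum

# Compare against the figures printed in the report (up to rounding).
print(math.isclose(mean, 80.09412, rel_tol=1e-9))
print(math.isclose(cv, 1.13867737, rel_tol=1e-6))
print(math.isclose(variance, 8317.688415, rel_tol=1e-6))
print(iqr, value_range)
```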
| Value | Count | Frequency (%) | |
| 0 | 3915 | 7.8% | |
| 1 | 2046 | 4.1% | |
| 2 | 1258 | 2.5% | |
| 3 | 1142 | 2.3% | |
| 4 | 1031 | 2.1% | |
| 5 | 885 | 1.8% | |
| 6 | 831 | 1.7% | |
| 7 | 790 | 1.6% | |
| 8 | 611 | 1.2% | |
| 11 | 570 | 1.1% | |
| Other values (404) | 36921 | 73.8% |
| Value | Count | Frequency (%) | |
| 0 | 3915 | 7.8% | |
| 1 | 2046 | 4.1% | |
| 2 | 1258 | 2.5% | |
| 3 | 1142 | 2.3% | |
| 4 | 1031 | 2.1% |
| Value | Count | Frequency (%) | |
| 709 | 1 | < 0.1% | |
| 542 | 12 | < 0.1% | |
| 518 | 17 | < 0.1% | |
| 504 | 10 | < 0.1% | |
| 479 | 16 | < 0.1% |
stays_in_weekend_nights
Real number (ℝ≥0)
| Distinct | 17 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.92852 |
|---|---|
| Minimum | 0 |
| Maximum | 19 |
| Zeros | 21640 |
| Zeros (%) | 43.3% |
| Memory size | 390.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 1 |
| Q3 | 2 |
| 95-th percentile | 2 |
| Maximum | 19 |
| Range | 19 |
| Interquartile range (IQR) | 2 |
Descriptive statistics
| Standard deviation | 0.9962883425 |
|---|---|
| Coefficient of variation (CV) | 1.072985334 |
| Kurtosis | 10.40739741 |
| Mean | 0.92852 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 1.51207629 |
| Sum | 46426 |
| Variance | 0.9925904614 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) | |
| 0 | 21640 | 43.3% | |
| 2 | 13840 | 27.7% | |
| 1 | 13031 | 26.1% | |
| 4 | 826 | 1.7% | |
| 3 | 564 | 1.1% | |
| 6 | 41 | 0.1% | |
| 5 | 22 | < 0.1% | |
| 8 | 18 | < 0.1% | |
| 10 | 4 | < 0.1% | |
| 7 | 2 | < 0.1% | |
| Other values (7) | 12 | < 0.1% |
| Value | Count | Frequency (%) | |
| 0 | 21640 | 43.3% | |
| 1 | 13031 | 26.1% | |
| 2 | 13840 | 27.7% | |
| 3 | 564 | 1.1% | |
| 4 | 826 | 1.7% |
| Value | Count | Frequency (%) | |
| 19 | 1 | < 0.1% | |
| 18 | 1 | < 0.1% | |
| 16 | 2 | < 0.1% | |
| 14 | 2 | < 0.1% | |
| 13 | 2 | < 0.1% |
stays_in_week_nights
Real number (ℝ≥0)
| Distinct | 31 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.46454 |
|---|---|
| Minimum | 0 |
| Maximum | 50 |
| Zeros | 3818 |
| Zeros (%) | 7.6% |
| Memory size | 390.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1 |
| median | 2 |
| Q3 | 3 |
| 95-th percentile | 5 |
| Maximum | 50 |
| Range | 50 |
| Interquartile range (IQR) | 2 |
Descriptive statistics
| Standard deviation | 1.936176016 |
|---|---|
| Coefficient of variation (CV) | 0.7856135489 |
| Kurtosis | 31.42440699 |
| Mean | 2.46454 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 2.987397579 |
| Sum | 123227 |
| Variance | 3.748777564 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) | |
| 1 | 13619 | 27.2% | |
| 2 | 12513 | 25.0% | |
| 3 | 9161 | 18.3% | |
| 5 | 4779 | 9.6% | |
| 4 | 4020 | 8.0% | |
| 0 | 3818 | 7.6% | |
| 6 | 616 | 1.2% | |
| 10 | 488 | 1.0% | |
| 7 | 481 | 1.0% | |
| 8 | 303 | 0.6% | |
| Other values (21) | 202 | 0.4% |
| Value | Count | Frequency (%) | |
| 0 | 3818 | 7.6% | |
| 1 | 13619 | 27.2% | |
| 2 | 12513 | 25.0% | |
| 3 | 9161 | 18.3% | |
| 4 | 4020 | 8.0% |
| Value | Count | Frequency (%) | |
| 50 | 1 | < 0.1% | |
| 42 | 1 | < 0.1% | |
| 41 | 1 | < 0.1% | |
| 40 | 1 | < 0.1% | |
| 35 | 1 | < 0.1% |
adults
Real number (ℝ≥0)
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.83028 |
|---|---|
| Minimum | 0 |
| Maximum | 4 |
| Zeros | 194 |
| Zeros (%) | 0.4% |
| Memory size | 390.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 2 |
| median | 2 |
| Q3 | 2 |
| 95-th percentile | 3 |
| Maximum | 4 |
| Range | 4 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.5090778966 |
|---|---|
| Coefficient of variation (CV) | 0.2781420857 |
| Kurtosis | 0.8729303839 |
| Mean | 1.83028 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | -0.3993167894 |
| Sum | 91514 |
| Variance | 0.2591603048 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) | |
| 2 | 36271 | 72.5% | |
| 1 | 10831 | 21.7% | |
| 3 | 2675 | 5.3% | |
| 0 | 194 | 0.4% | |
| 4 | 29 | 0.1% |
| Value | Count | Frequency (%) | |
| 0 | 194 | 0.4% | |
| 1 | 10831 | 21.7% | |
| 2 | 36271 | 72.5% | |
| 3 | 2675 | 5.3% | |
| 4 | 29 | 0.1% |
| Value | Count | Frequency (%) | |
| 4 | 29 | 0.1% | |
| 3 | 2675 | 5.3% | |
| 2 | 36271 | 72.5% | |
| 1 | 10831 | 21.7% | |
| 0 | 194 | 0.4% |
children
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 390.6 KiB |
| Value | Count | Frequency (%) | |
| none | 45962 | 91.9% | |
| children | 4038 | 8.1% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 8 |
|---|---|
| Median length | 4 |
| Mean length | 4.32304 |
| Min length | 4 |
meal
Categorical
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 390.6 KiB |
| Value | Count | Frequency (%) | |
| BB | 38316 | 76.6% | |
| HB | 6399 | 12.8% | |
| SC | 4494 | 9.0% | |
| Undefined | 580 | 1.2% | |
| FB | 211 | 0.4% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 9 |
|---|---|
| Median length | 2 |
| Mean length | 2.0812 |
| Min length | 2 |
country
Categorical
| Distinct | 154 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 289 |
| Missing (%) | 0.6% |
| Memory size | 390.6 KiB |
| Value | Count | Frequency (%) | |
| PRT | 14046 | 28.1% | |
| GBR | 6405 | 12.8% | |
| FRA | 5627 | 11.3% | |
| ESP | 4298 | 8.6% | |
| DEU | 4047 | 8.1% | |
| IRL | 1691 | 3.4% | |
| ITA | 1607 | 3.2% | |
| BEL | 1250 | 2.5% | |
| NLD | 1123 | 2.2% | |
| USA | 1059 | 2.1% | |
| Other values (144) | 8558 | 17.1% |
Unique
| Unique | 35 |
|---|---|
| Unique (%) | 0.1% |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 2.9866 |
| Min length | 2 |
market_segment
Categorical
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 390.6 KiB |
| Value | Count | Frequency (%) | |
| Online_TA | 23760 | 47.5% | |
| Offline_TA/TO | 10604 | 21.2% | |
| Direct | 7131 | 14.3% | |
| Groups | 5124 | 10.2% | |
| Corporate | 2832 | 5.7% | |
| Complementary | 427 | 0.9% | |
| Aviation | 122 | 0.2% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 13 |
|---|---|
| Median length | 9 |
| Mean length | 9.14474 |
| Min length | 6 |
distribution_channel
Categorical
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 390.6 KiB |
| Value | Count | Frequency (%) | |
| TA/TO | 38349 | 76.7% | |
| Direct | 8083 | 16.2% | |
| Corporate | 3459 | 6.9% | |
| GDS | 108 | 0.2% | |
| Undefined | 1 | < 0.1% |
Unique
| Unique | 1 |
|---|---|
| Unique (%) | < 0.1% |
Length
| Max length | 9 |
|---|---|
| Median length | 5 |
| Mean length | 5.43414 |
| Min length | 3 |
is_repeated_guest
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 390.6 KiB |
| Value | Count | Frequency (%) | |
| 0 | 47840 | 95.7% | |
| 1 | 2160 | 4.3% |
previous_cancellations
Real number (ℝ≥0)
| Distinct | 9 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.01674 |
|---|---|
| Minimum | 0 |
| Maximum | 13 |
| Zeros | 49619 |
| Zeros (%) | 99.2% |
| Memory size | 390.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 0 |
| Maximum | 13 |
| Range | 13 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.287856613 |
|---|---|
| Coefficient of variation (CV) | 17.19573554 |
| Kurtosis | 1004.413708 |
| Mean | 0.01674 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 28.90866083 |
| Sum | 837 |
| Variance | 0.08286142963 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) | |
| 0 | 49619 | 99.2% | |
| 1 | 246 | 0.5% | |
| 2 | 52 | 0.1% | |
| 3 | 25 | 0.1% | |
| 11 | 20 | < 0.1% | |
| 4 | 15 | < 0.1% | |
| 5 | 13 | < 0.1% | |
| 6 | 9 | < 0.1% | |
| 13 | 1 | < 0.1% |
| Value | Count | Frequency (%) | |
| 0 | 49619 | 99.2% | |
| 1 | 246 | 0.5% | |
| 2 | 52 | 0.1% | |
| 3 | 25 | 0.1% | |
| 4 | 15 | < 0.1% |
| Value | Count | Frequency (%) | |
| 13 | 1 | < 0.1% | |
| 11 | 20 | < 0.1% | |
| 6 | 9 | < 0.1% | |
| 5 | 13 | < 0.1% | |
| 4 | 15 | < 0.1% |
previous_bookings_not_canceled
Real number (ℝ≥0)
| Distinct | 57 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.20274 |
|---|---|
| Minimum | 0 |
| Maximum | 72 |
| Zeros | 47735 |
| Zeros (%) | 95.5% |
| Memory size | 390.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 0 |
| Maximum | 72 |
| Range | 72 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 1.803691093 |
|---|---|
| Coefficient of variation (CV) | 8.896572422 |
| Kurtosis | 537.4106085 |
| Mean | 0.20274 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 19.61142262 |
| Sum | 10137 |
| Variance | 3.253301558 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) | |
| 0 | 47735 | 95.5% | |
| 1 | 956 | 1.9% | |
| 2 | 370 | 0.7% | |
| 3 | 210 | 0.4% | |
| 4 | 148 | 0.3% | |
| 5 | 112 | 0.2% | |
| 6 | 76 | 0.2% | |
| 7 | 50 | 0.1% | |
| 9 | 42 | 0.1% | |
| 8 | 39 | 0.1% | |
| Other values (47) | 262 | 0.5% |
| Value | Count | Frequency (%) | |
| 0 | 47735 | 95.5% | |
| 1 | 956 | 1.9% | |
| 2 | 370 | 0.7% | |
| 3 | 210 | 0.4% | |
| 4 | 148 | 0.3% |
| Value | Count | Frequency (%) | |
| 72 | 1 | < 0.1% | |
| 71 | 1 | < 0.1% | |
| 69 | 1 | < 0.1% | |
| 67 | 1 | < 0.1% | |
| 65 | 1 | < 0.1% |
reserved_room_type
Categorical
| Distinct | 9 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 390.6 KiB |
| Value | Count | Frequency (%) | |
| A | 34889 | 69.8% | |
| D | 8675 | 17.3% | |
| E | 3096 | 6.2% | |
| F | 1299 | 2.6% | |
| G | 899 | 1.8% | |
| B | 488 | 1.0% | |
| C | 417 | 0.8% | |
| H | 235 | 0.5% | |
| L | 2 | < 0.1% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
assigned_room_type
Categorical
| Distinct | 10 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 390.6 KiB |
| Value | Count | Frequency (%) | |
| A | 27357 | 54.7% | |
| D | 12577 | 25.2% | |
| E | 3924 | 7.8% | |
| F | 1839 | 3.7% | |
| C | 1305 | 2.6% | |
| G | 1185 | 2.4% | |
| B | 1079 | 2.2% | |
| H | 313 | 0.6% | |
| I | 239 | 0.5% | |
| K | 182 | 0.4% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
booking_changes
Real number (ℝ≥0)
| Distinct | 19 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.29496 |
|---|---|
| Minimum | 0 |
| Maximum | 21 |
| Zeros | 39823 |
| Zeros (%) | 79.6% |
| Memory size | 390.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 2 |
| Maximum | 21 |
| Range | 21 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.7400064531 |
|---|---|
| Coefficient of variation (CV) | 2.508836632 |
| Kurtosis | 67.16072387 |
| Mean | 0.29496 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 5.417086108 |
| Sum | 14748 |
| Variance | 0.5476095506 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) | |
| 0 | 39823 | 79.6% | |
| 1 | 7274 | 14.5% | |
| 2 | 2018 | 4.0% | |
| 3 | 523 | 1.0% | |
| 4 | 212 | 0.4% | |
| 5 | 71 | 0.1% | |
| 6 | 31 | 0.1% | |
| 7 | 13 | < 0.1% | |
| 8 | 11 | < 0.1% | |
| 9 | 6 | < 0.1% | |
| Other values (9) | 18 | < 0.1% |
| Value | Count | Frequency (%) | |
| 0 | 39823 | 79.6% | |
| 1 | 7274 | 14.5% | |
| 2 | 2018 | 4.0% | |
| 3 | 523 | 1.0% | |
| 4 | 212 | 0.4% |
| Value | Count | Frequency (%) | |
| 21 | 1 | < 0.1% | |
| 18 | 1 | < 0.1% | |
| 17 | 2 | < 0.1% | |
| 16 | 1 | < 0.1% | |
| 15 | 3 | < 0.1% |
deposit_type
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 390.6 KiB |
| Value | Count | Frequency (%) | |
| No_Deposit | 49839 | 99.7% | |
| Refundable | 92 | 0.2% | |
| Non_Refund | 69 | 0.1% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 10 |
|---|---|
| Median length | 10 |
| Mean length | 10 |
| Min length | 10 |
days_in_waiting_list
Real number (ℝ≥0)
| Distinct | 92 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.5704 |
|---|---|
| Minimum | 0 |
| Maximum | 379 |
| Zeros | 49116 |
| Zeros (%) | 98.2% |
| Memory size | 390.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 0 |
| Maximum | 379 |
| Range | 379 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 14.79030016 |
|---|---|
| Coefficient of variation (CV) | 9.418173817 |
| Kurtosis | 200.4084546 |
| Mean | 1.5704 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 12.8526275 |
| Sum | 78520 |
| Variance | 218.7529789 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) | |
| 0 | 49116 | 98.2% | |
| 58 | 114 | 0.2% | |
| 87 | 47 | 0.1% | |
| 38 | 34 | 0.1% | |
| 63 | 34 | 0.1% | |
| 122 | 33 | 0.1% | |
| 223 | 26 | 0.1% | |
| 65 | 26 | 0.1% | |
| 77 | 25 | 0.1% | |
| 176 | 22 | < 0.1% | |
| Other values (82) | 523 | 1.0% |
| Value | Count | Frequency (%) | |
| 0 | 49116 | 98.2% | |
| 1 | 7 | < 0.1% | |
| 2 | 2 | < 0.1% | |
| 4 | 9 | < 0.1% | |
| 5 | 3 | < 0.1% |
| Value | Count | Frequency (%) | |
| 379 | 4 | < 0.1% | |
| 330 | 11 | < 0.1% | |
| 259 | 7 | < 0.1% | |
| 236 | 19 | < 0.1% | |
| 224 | 2 | < 0.1% |
customer_type
Categorical
| Distinct | 4 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 390.6 KiB |
| Value | Count | Frequency (%) | |
| Transient | 35343 | 70.7% | |
| Transient-Party | 12430 | 24.9% | |
| Contract | 1864 | 3.7% | |
| Group | 363 | 0.7% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 15 |
|---|---|
| Median length | 9 |
| Mean length | 10.42528 |
| Min length | 5 |
average_daily_rate
Real number (ℝ)
| Distinct | 6173 |
|---|---|
| Distinct (%) | 12.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 99.9423424 |
|---|---|
| Minimum | -6.38 |
| Maximum | 510 |
| Zeros | 1166 |
| Zeros (%) | 2.3% |
| Memory size | 390.6 KiB |
Quantile statistics
| Minimum | -6.38 |
|---|---|
| 5-th percentile | 35 |
| Q1 | 67.5 |
| median | 92.5 |
| Q3 | 125 |
| 95-th percentile | 191 |
| Maximum | 510 |
| Range | 516.38 |
| Interquartile range (IQR) | 57.5 |
Descriptive statistics
| Standard deviation | 49.03909248 |
|---|---|
| Coefficient of variation (CV) | 0.4906738355 |
| Kurtosis | 2.07847995 |
| Mean | 99.9423424 |
| Median Absolute Deviation (MAD) | 27.5 |
| Skewness | 0.9545451394 |
| Sum | 4997117.12 |
| Variance | 2404.832591 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) | |
| 0 | 1166 | 2.3% | |
| 65 | 1104 | 2.2% | |
| 75 | 1018 | 2.0% | |
| 85 | 651 | 1.3% | |
| 90 | 648 | 1.3% | |
| 95 | 633 | 1.3% | |
| 80 | 554 | 1.1% | |
| 48 | 539 | 1.1% | |
| 115 | 536 | 1.1% | |
| 60 | 470 | 0.9% | |
| Other values (6163) | 42681 | 85.4% |
| Value | Count | Frequency (%) | |
| -6.38 | 1 | < 0.1% | |
| 0 | 1166 | 2.3% | |
| 1 | 6 | < 0.1% | |
| 1.29 | 1 | < 0.1% | |
| 1.56 | 2 | < 0.1% |
| Value | Count | Frequency (%) | |
| 510 | 1 | < 0.1% | |
| 508 | 1 | < 0.1% | |
| 451.5 | 1 | < 0.1% | |
| 426.25 | 1 | < 0.1% | |
| 402 | 1 | < 0.1% |
required_car_parking_spaces
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 390.6 KiB |
| Value | Count | Frequency (%) | |
| none | 45019 | 90.0% | |
| parking | 4981 | 10.0% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 7 |
|---|---|
| Median length | 4 |
| Mean length | 4.29886 |
| Min length | 4 |
total_of_special_requests
Real number (ℝ≥0)
| Distinct | 6 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.71266 |
|---|---|
| Minimum | 0 |
| Maximum | 5 |
| Zeros | 24493 |
| Zeros (%) | 49.0% |
| Memory size | 390.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 1 |
| Q3 | 1 |
| 95-th percentile | 2 |
| Maximum | 5 |
| Range | 5 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 0.833804311 |
|---|---|
| Coefficient of variation (CV) | 1.16998893 |
| Kurtosis | 0.9146507264 |
| Mean | 0.71266 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 1.084228654 |
| Sum | 35633 |
| Variance | 0.695229629 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) | |
| 0 | 24493 | 49.0% | |
| 1 | 17234 | 34.5% | |
| 2 | 6679 | 13.4% | |
| 3 | 1358 | 2.7% | |
| 4 | 213 | 0.4% | |
| 5 | 23 | < 0.1% |
| Value | Count | Frequency (%) | |
| 0 | 24493 | 49.0% | |
| 1 | 17234 | 34.5% | |
| 2 | 6679 | 13.4% | |
| 3 | 1358 | 2.7% | |
| 4 | 213 | 0.4% |
| Value | Count | Frequency (%) | |
| 5 | 23 | < 0.1% | |
| 4 | 213 | 0.4% | |
| 3 | 1358 | 2.7% | |
| 2 | 6679 | 13.4% | |
| 1 | 17234 | 34.5% |
arrival_date
Categorical
| Distinct | 793 |
|---|---|
| Distinct (%) | 1.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 390.6 KiB |
| Value | Count | Frequency (%) | |
| 2015-12-05 | 173 | 0.3% | |
| 2016-06-24 | 148 | 0.3% | |
| 2016-05-26 | 143 | 0.3% | |
| 2016-06-06 | 135 | 0.3% | |
| 2017-02-25 | 133 | 0.3% | |
| 2015-10-02 | 128 | 0.3% | |
| 2017-05-25 | 124 | 0.2% | |
| 2015-10-15 | 122 | 0.2% | |
| 2015-10-19 | 120 | 0.2% | |
| 2016-03-24 | 119 | 0.2% | |
| Other values (783) | 48655 | 97.3% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 10 |
|---|---|
| Median length | 10 |
| Mean length | 10 |
| Min length | 10 |
dummy_children
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 195.3 KiB |
| Value | Count | Frequency (%) | |
| 0 | 45962 | 91.9% | |
| 1 | 4038 | 8.1% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the line does not affect r; only the sign of the slope does.
To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
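As an illustration of that definition, here is a minimal standard-library sketch. The function name `pearson_r` and the sample data are made up for this example; the report itself computes correlations through its own tooling.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of products and squares; the common 1/n factor cancels in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: y is roughly linear in x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
# Shifting and rescaling y (with a positive factor) leaves r unchanged.
r_rescaled = pearson_r(x, [10 * v - 7 for v in y])
print(r, r_rescaled)
```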
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.
To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
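The rank-based definition can be sketched in plain Python as follows. The helper names (`average_ranks`, `pearson_r`, `spearman_rho`) and the sample data are illustrative, and ties are assumed to receive the average of their rank positions:

```python
import math


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Pearson correlation applied to the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: y grows monotonically but not linearly with x.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y), pearson_r(x, y))
```

In this example the monotonic relationship is perfect, so ρ reaches 1 while Pearson's r stays below 1.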
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.
To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
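The pair-counting description translates directly into code. The sketch below implements exactly the formula as stated (a variant often called tau-a), counting tied pairs as neither concordant nor discordant. Statistical libraries frequently apply a tie correction instead, so their output can differ on data with many ties. The function name and sample data are illustrative only.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        # sign == 0: a tie in x or y, counted as neither
    n_pairs = n * (n - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical data: one swapped pair out of six possible pairs.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)
print(tau)
```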
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation for Phik is available separately.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. This report uses the bias-corrected measure proposed by Bergsma in 2013.
First rows
| | hotel | lead_time | stays_in_weekend_nights | stays_in_week_nights | adults | children | meal | country | market_segment | distribution_channel | is_repeated_guest | previous_cancellations | previous_bookings_not_canceled | reserved_room_type | assigned_room_type | booking_changes | deposit_type | days_in_waiting_list | customer_type | average_daily_rate | required_car_parking_spaces | total_of_special_requests | arrival_date | dummy_children |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | City_Hotel | 217 | 1 | 3 | 2 | none | BB | DEU | Offline_TA/TO | TA/TO | 0 | 0 | 0 | A | A | 0 | No_Deposit | 0 | Transient-Party | 80.75 | none | 1 | 2016-09-01 | 0 |
| 1 | City_Hotel | 2 | 0 | 1 | 2 | none | BB | PRT | Direct | Direct | 0 | 0 | 0 | D | K | 0 | No_Deposit | 0 | Transient | 170.00 | none | 3 | 2017-08-25 | 0 |
| 2 | Resort_Hotel | 95 | 2 | 5 | 2 | none | BB | GBR | Online_TA | TA/TO | 0 | 0 | 0 | A | A | 2 | No_Deposit | 0 | Transient | 8.00 | none | 2 | 2016-11-19 | 0 |
| 3 | Resort_Hotel | 143 | 2 | 6 | 2 | none | HB | ROU | Online_TA | TA/TO | 0 | 0 | 0 | A | A | 0 | No_Deposit | 0 | Transient | 81.00 | none | 1 | 2016-04-26 | 0 |
| 4 | Resort_Hotel | 136 | 1 | 4 | 2 | none | HB | PRT | Direct | Direct | 0 | 0 | 0 | F | F | 0 | No_Deposit | 0 | Transient | 157.60 | none | 4 | 2016-12-28 | 0 |
| 5 | City_Hotel | 67 | 2 | 2 | 2 | none | SC | GBR | Online_TA | TA/TO | 0 | 0 | 0 | A | A | 0 | No_Deposit | 0 | Transient | 49.09 | none | 1 | 2016-03-13 | 0 |
| 6 | Resort_Hotel | 47 | 0 | 2 | 2 | children | BB | ESP | Direct | Direct | 0 | 0 | 0 | C | C | 0 | No_Deposit | 0 | Transient | 289.00 | none | 1 | 2017-08-23 | 1 |
| 7 | City_Hotel | 56 | 0 | 3 | 0 | children | BB | ESP | Online_TA | TA/TO | 0 | 0 | 0 | B | A | 0 | No_Deposit | 0 | Transient | 82.44 | none | 1 | 2016-12-08 | 1 |
| 8 | City_Hotel | 80 | 0 | 4 | 2 | none | BB | FRA | Online_TA | TA/TO | 0 | 0 | 0 | D | D | 0 | No_Deposit | 0 | Transient | 135.00 | none | 1 | 2017-05-02 | 0 |
| 9 | City_Hotel | 6 | 2 | 2 | 2 | children | BB | FRA | Online_TA | TA/TO | 0 | 0 | 0 | A | A | 0 | No_Deposit | 0 | Transient | 180.00 | none | 1 | 2016-08-07 | 1 |
Last rows
| | hotel | lead_time | stays_in_weekend_nights | stays_in_week_nights | adults | children | meal | country | market_segment | distribution_channel | is_repeated_guest | previous_cancellations | previous_bookings_not_canceled | reserved_room_type | assigned_room_type | booking_changes | deposit_type | days_in_waiting_list | customer_type | average_daily_rate | required_car_parking_spaces | total_of_special_requests | arrival_date | dummy_children |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 49990 | Resort_Hotel | 283 | 2 | 8 | 2 | none | BB | GBR | Offline_TA/TO | TA/TO | 0 | 0 | 0 | A | A | 0 | No_Deposit | 0 | Contract | 72.20 | none | 1 | 2017-06-29 | 0 |
| 49991 | Resort_Hotel | 197 | 2 | 8 | 2 | none | Undefined | GBR | Offline_TA/TO | TA/TO | 0 | 0 | 0 | D | D | 1 | No_Deposit | 0 | Transient | 114.90 | none | 0 | 2016-06-01 | 0 |
| 49992 | City_Hotel | 414 | 0 | 2 | 2 | none | HB | DEU | Groups | TA/TO | 0 | 0 | 0 | A | A | 0 | No_Deposit | 0 | Transient-Party | 122.40 | none | 1 | 2017-07-13 | 0 |
| 49993 | City_Hotel | 225 | 2 | 4 | 2 | none | BB | BRA | Online_TA | TA/TO | 0 | 0 | 1 | A | A | 0 | No_Deposit | 0 | Group | 70.03 | none | 1 | 2016-10-20 | 0 |
| 49994 | City_Hotel | 73 | 0 | 2 | 2 | none | SC | FRA | Online_TA | TA/TO | 0 | 0 | 0 | A | A | 0 | No_Deposit | 0 | Transient | 79.20 | none | 1 | 2017-01-27 | 0 |
| 49995 | Resort_Hotel | 172 | 0 | 2 | 2 | children | BB | PRT | Direct | Direct | 0 | 0 | 0 | A | A | 1 | No_Deposit | 0 | Transient | 73.39 | none | 1 | 2016-10-07 | 1 |
| 49996 | Resort_Hotel | 48 | 0 | 4 | 2 | none | FB | PRT | Direct | Direct | 0 | 0 | 0 | A | B | 2 | No_Deposit | 0 | Transient | 158.00 | none | 0 | 2015-09-01 | 0 |
| 49997 | City_Hotel | 155 | 0 | 4 | 2 | none | BB | DEU | Offline_TA/TO | TA/TO | 0 | 0 | 0 | A | A | 0 | No_Deposit | 0 | Transient | 82.50 | none | 1 | 2017-07-26 | 0 |
| 49998 | Resort_Hotel | 140 | 2 | 5 | 2 | none | HB | GBR | Direct | Direct | 0 | 0 | 0 | G | G | 0 | No_Deposit | 0 | Transient | 143.00 | none | 0 | 2016-04-28 | 0 |
| 49999 | City_Hotel | 12 | 2 | 1 | 2 | none | BB | DEU | Online_TA | TA/TO | 0 | 0 | 0 | A | A | 0 | No_Deposit | 0 | Transient | 171.33 | none | 1 | 2016-09-18 | 0 |
Most frequent duplicate rows
| | hotel | lead_time | stays_in_weekend_nights | stays_in_week_nights | adults | children | meal | country | market_segment | distribution_channel | is_repeated_guest | previous_cancellations | previous_bookings_not_canceled | reserved_room_type | assigned_room_type | booking_changes | deposit_type | days_in_waiting_list | customer_type | average_daily_rate | required_car_parking_spaces | total_of_special_requests | arrival_date | dummy_children | count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1413 | City_Hotel | 134 | 0 | 1 | 1 | none | BB | PRT | Groups | TA/TO | 0 | 0 | 0 | A | A | 0 | No_Deposit | 0 | Transient-Party | 75.00 | none | 0 | 2017-02-25 | 0 | 36 |
| 2012 | City_Hotel | 377 | 0 | 2 | 2 | none | HB | DEU | Offline_TA/TO | TA/TO | 0 | 0 | 0 | A | A | 0 | No_Deposit | 0 | Transient-Party | 115.00 | none | 1 | 2016-10-14 | 0 | 35 |
| 1980 | City_Hotel | 320 | 0 | 2 | 2 | none | HB | DEU | Offline_TA/TO | TA/TO | 0 | 0 | 0 | A | A | 0 | No_Deposit | 0 | Transient-Party | 115.00 | none | 1 | 2016-08-18 | 0 | 34 |
| 747 | City_Hotel | 48 | 0 | 2 | 1 | none | BB | ESP | Groups | TA/TO | 0 | 0 | 0 | A | A | 0 | No_Deposit | 0 | Transient-Party | 65.00 | none | 1 | 2017-02-22 | 0 | 33 |
| 1759 | City_Hotel | 213 | 1 | 3 | 1 | none | HB | PRT | Groups | TA/TO | 0 | 0 | 0 | A | A | 1 | No_Deposit | 0 | Transient-Party | 104.00 | none | 0 | 2017-08-28 | 0 | 33 |
| 1872 | City_Hotel | 257 | 0 | 2 | 2 | none | HB | PRT | Offline_TA/TO | TA/TO | 0 | 0 | 0 | A | A | 0 | No_Deposit | 0 | Transient | 101.50 | none | 0 | 2015-07-01 | 0 | 32 |
| 2023 | City_Hotel | 405 | 0 | 2 | 2 | none | HB | DEU | Offline_TA/TO | TA/TO | 0 | 0 | 0 | A | A | 0 | No_Deposit | 0 | Transient-Party | 114.40 | none | 0 | 2017-07-04 | 0 | 32 |
| 940 | City_Hotel | 69 | 2 | 1 | 2 | none | BB | PRT | Groups | TA/TO | 0 | 0 | 0 | A | A | 0 | No_Deposit | 58 | Transient-Party | 85.67 | none | 0 | 2015-10-25 | 0 | 30 |
| 1859 | City_Hotel | 256 | 0 | 2 | 2 | none | HB | DEU | Offline_TA/TO | TA/TO | 0 | 0 | 0 | A | A | 0 | No_Deposit | 0 | Transient-Party | 115.00 | none | 1 | 2016-06-15 | 0 | 30 |
| 2030 | City_Hotel | 414 | 0 | 2 | 2 | none | HB | DEU | Groups | TA/TO | 0 | 0 | 0 | A | A | 0 | No_Deposit | 0 | Transient-Party | 122.40 | none | 1 | 2017-07-13 | 0 | 29 |